34 research outputs found

    Modelling case-cohort survey data by multiple imputation (application in cardiovascular epidemiology) [Modélisation des données d'enquêtes cas-cohorte par imputation multiple (Application en épidémiologie cardio-vasculaire)]

    The weighted estimators generally used for analyzing case-cohort studies are not fully efficient. However, case-cohort surveys are a special type of incomplete data in which the observation process is controlled by the study organizers. Methods for analyzing Missing At Random (MAR) data may therefore be appropriate, in particular multiple imputation, which uses all the available information and makes it possible to approximate the maximum partial likelihood estimator. This approach is based on the generation of several plausible completed data sets, taking into account the different levels of uncertainty about the missing values. It makes it easy to adapt any statistical tool available for cohort data, for instance estimators of the predictive ability of a model or of an additional variable, which raise specific problems with case-cohort data. We have shown that the imputation model must be estimated on all the completely observed subjects (cases and non-cases), including the case indicator among the explanatory variables. We validated this approach with several sets of simulations: 1) completely simulated data, where the true parameter values were known; 2) case-cohort data simulated from the PRIME cohort, without any phase-1 variable (observed on all subjects) strongly predictive of the phase-2 variable (incompletely observed); 3) case-cohort data simulated from the NWTS cohort, where a phase-1 variable strongly predictive of the phase-2 variable was available. These simulations showed that multiple imputation generally provided unbiased estimates of the risk ratios. For the phase-1 variables, the estimates were almost as precise as those provided by the full cohort, slightly more precise than the calibrated estimator of Breslow et al., and markedly more precise than the classical weighted estimators. For the phase-2 variables, the multiple imputation estimator was generally unbiased, with a precision better than that of the classical weighted estimators and similar to that of the calibrated estimator of Breslow et al. The simulations performed with the NWTS cohort data gave less satisfactory results for the effects involving the phase-2 variable: the multiple imputation estimators were slightly biased and less precise than the weighted estimators. This can be explained by the interaction terms involving the phase-2 variable in the analysis model, which made it necessary to estimate specific imputation models in different strata of the cohort, some of which included too few cases to satisfy the asymptotic conditions. We recommend using multiple imputation to improve the precision of the risk ratio estimates, while checking that they are similar to the weighted estimates. Our simulations also showed that multiple imputation provided estimates of the predictive value of a model (Harrell's C) or of an additional variable (difference of C indices, NRI or IDI) similar to those obtained from the full cohort.
    PARIS11-SCD-Bib. électronique (914719901) / Sudoc, France

    Multiple imputation for estimating hazard ratios and predictive abilities in case-cohort surveys

    Background: The weighted estimators generally used for analyzing case-cohort studies are not fully efficient, and naive estimates of the predictive ability of a model from case-cohort data depend on the subcohort size. However, case-cohort studies represent a special type of incomplete data, and methods for analyzing incomplete data should be appropriate, in particular multiple imputation (MI).
    Methods: We performed simulations to validate the MI approach for estimating hazard ratios and the predictive ability of a model or of an additional variable in case-cohort surveys. As an illustration, we analyzed a case-cohort survey from the Three-City study to estimate the predictive ability of D-dimer plasma concentration on coronary heart disease (CHD) and on vascular dementia (VaD) risks.
    Results: When the imputation model of the phase-2 variable was correctly specified, MI estimates of hazard ratios and predictive abilities were similar to those obtained with full data. When the imputation model was misspecified, MI could provide biased estimates of hazard ratios and predictive abilities. In the Three-City case-cohort study, elevated D-dimer levels increased the risk of VaD (hazard ratio for two consecutive tertiles = 1.69, 95% CI: 1.63-1.74). However, D-dimer levels did not improve the predictive ability of the model.
    Conclusions: MI is a simple approach for analyzing case-cohort data and provides an easy evaluation of the predictive ability of a model or of an additional variable.

    Latent variables and structural equation models for longitudinal relationships: an illustration in nutritional epidemiology

    Background: The use of structural equation modeling and latent variables remains uncommon in epidemiology despite its potential usefulness. This usefulness was illustrated by studying cross-sectional and longitudinal relationships between eating behavior and adiposity, using four different indicators of fat mass.
    Methods: Using data from a longitudinal community-based study, we fitted structural equation models including two latent variables (baseline adiposity and adiposity change after 2 years of follow-up, respectively), each defined by the following four anthropometric measurements (or, respectively, by their changes): body mass index, waist circumference, skinfold thickness and percent body fat. Latent adiposity variables were hypothesized to depend on a cognitive restraint score, calculated from answers to an eating-behavior questionnaire (TFEQ-18), either cross-sectionally or longitudinally.
    Results: We found that high baseline adiposity was associated with a 2-year increase in the cognitive restraint score, and that no convincing relationship between baseline cognitive restraint and 2-year adiposity change could be established.
    Conclusions: The latent variable modeling approach enabled the presentation of synthetic results rather than separate regression models, and a detailed analysis of the causal effects of interest. In the general population, restrained eating appears to be an adaptive response of subjects prone to gaining weight rather than a risk factor for fat-mass increase.

    Analysis of case-cohort surveys by multiple imputation [Analyse d'enquêtes cas-cohorte par imputation multiple]

    The weighted estimators used to analyze case-cohort studies are sometimes inefficient. However, a case-cohort survey can also be viewed as a special case of incomplete data, and methods for analyzing incomplete data may be relevant, in particular multiple imputation. This approach is based on generating several plausible complete data sets, taking into account the uncertainty about the missing data. If the imputation model is correctly specified, the multiple imputation estimator is unbiased. We showed that a correct imputation model can be estimated from the complete data (cases and controls) by using the case indicator as an explanatory variable. We simulated case-cohort surveys whose subcohorts were selected by uniform or stratified sampling. Multiple imputation and the weighted estimators both provided unbiased estimates. The multiple imputation estimates were slightly more precise than those obtained from the weighted analysis. For the phase-1 variables, the relative increase in the standard deviations of the weighted analysis compared with multiple imputation ranged from 8 to 39%. For the phase-2 variables, the relative increase ranged from 3 to 24%. Thus multiple imputation, which uses all the available data and approximates the maximum partial likelihood estimator, is a good alternative to the weighted estimator.

    Modelling incomplete longitudinal observations [Modélisation des observations longitudinales incomplètes]

    LE KREMLIN-B.- PARIS 11-BU Méd (940432101) / Sudoc; PARIS-BIUP (751062107) / Sudoc; France

    Multiple imputation analysis of case-cohort studies

    The usual methods for analyzing case-cohort studies rely on weighted estimators that are sometimes not fully efficient. Multiple imputation might be a good alternative because it uses all the data available and approximates the maximum partial likelihood estimator. This method is based on the generation of several plausible complete data sets, taking into account uncertainty about missing values. When the imputation model is correctly defined, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), using the case status as one of the explanatory variables. To validate the approach, we analyzed case-cohort studies first with completely simulated data and then with case-cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37% for the phase-1 variables and from 5 to 19% for the phase-2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators, but it was also slightly biased. The analyses of case-cohort data sampled from complete cohorts showed that, even when no strong predictor of the phase-2 variable was available, the multiple imputation estimator was unbiased, as precise as the weighted estimator for the phase-2 variable, and slightly more precise than the weighted estimators for the phase-1 variables. However, the multiple imputation estimator was found to be biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case-cohort data. In practice, we suggest building the analysis model using only the case-cohort data and weighted estimators. Multiple imputation can then be used to reanalyze the data with the selected model in order to improve the precision of the results.